Learning Accurate, Compact, and Interpretable Tree Annotation

نویسندگان

  • Slav Petrov
  • Leon Barrett
  • Romain Thibaux
  • Dan Klein
چکیده

We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple Xbar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Minimization within Decision Trees

Decision trees for classification can be constructed using mathematical programming. Within decision tree algorithms, the feature minimization problem is to construct accurate decisions using as few features or attributes within each decision as possible. Feature minimization is an important aspect of data mining since it helps identify what attributes are important and helps produce accurate a...

متن کامل

Efficient and Interpretable Prediction of Protein Functional Classes by Correspondence Analysis and Compact Set Relations

Predicting protein functional classes such as localization sites and modifications plays a crucial role in function annotation. Given a tremendous amount of sequence data yielded from high-throughput sequencing experiments, the need of efficient and interpretable prediction strategies has been rapidly amplified. Our previous approach for subcellular localization prediction, PSLDoc, archives hig...

متن کامل

A NOTE TO INTERPRETABLE FUZZY MODELS AND THEIR LEARNING

In this paper we turn the attention to a well developed theory of fuzzy/lin-guis-tic models that are interpretable and, moreover, can be learned from the data.We present four different situations demonstrating both interpretability as well as learning abilities of these models.

متن کامل

MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine

Machine learning algorithms that are both interpretable and accurate are essential in applications such as medicine where errors can have a dire consequence. Unfortunately, there is currently a tradeoff between accuracy and interpretability among state-of-the-art methods. Decision trees are interpretable and are therefore used extensively throughout medicine for stratifying patients. Current de...

متن کامل

Interpretable Classification Models for Recidivism Prediction

We investigate a long-debated question, which is how to create predictive models of recidivism that are sufficiently accurate, transparent, and interpretable to use for decision-making. This question is complicated as these models are used to support different decisions, from sentencing, to determining release on probation, to allocating preventative social services. Each case might have an obj...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006